ABSTRACT

The NucleaRDB is a Molecular Class-Specific 
Information System that collects, combines, valid- 
ates and disseminates large amounts of heteroge- 
neous data on nuclear hormone receptors. It 
contains both experimental and computationally 
derived data. The data and knowledge present in 
the NucleaRDB can be accessed using a number 
of different interactive and programmatic methods 
and query systems. A nuclear hormone receptor- 
specific PDF reader interface is available that can 
integrate the contents of the NucleaRDB with full- 
text scientific articles. The NucleaRDB is freely avail- 
able at http://www.receptors.org/nucleardb. 

INTRODUCTION 

Nuclear receptors (NRs) are ligand-inducible transcrip- 
tion factors that regulate processes, such as homeostasis, 
differentiation, embryonic development and organ physi- 
ology. A total of 49 human NRs have been identified (1). 
Their ligands are lipophilic compounds such as steroids,
thyroid hormone, vitamin D3 and retinoids (2). The en- 
dogenous ligands are not yet known for 30% of the NRs 
(3). As nuclear receptors are involved in almost all aspects 
of human physiology and are implicated in many import- 
ant diseases including cancer, diabetes and osteoporosis, 
understanding of these receptors has major implications 
for human biology and for the development of new drug 
treatments. Nuclear receptors are targets for the
pharmaceutical industry of similar importance (4) to the G
protein-coupled receptors (GPCRs), ion channels and
kinases.

Due to the increasing amounts of experimental and
computational data buried in numerous databases and
scientific articles, the task of extracting, combining and
validating these data is becoming an increasingly large
hurdle for the individual scientist. Databases that revolve
around a single protein family can help researchers
use all the data needed for their research, while relieving
them of the onerous tasks related to the retrieval of
many data from different sources (5).

The NucleaRDB is a data source that holds many
different data types (Table 1) in a well-organized and easily
accessible form (6). The data are validated, internally
consistent and updated regularly. The NucleaRDB provides
access to the data via various interfaces which, depending
on the users' needs, are suited either for automated access
or for interactive usage.

DATA CONTENTS 

Primary data 

The NucleaRDB contains three different primary data 
types: sequences, structures and mutations. Sequences 
and structures were updated as described previously (7). 
Mutation data was obtained from the Nuclear Receptor 
Mutation Database (8) and fully integrated in the 
NucleaRDB. In addition, a large body of mutations
was extracted from the literature by the software package
MuteXt (9).
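
To illustrate what extracting mutations from literature involves, the sketch below finds point-mutation mentions written in the common one-letter notation (for example R274A). It is a minimal sketch and not the MuteXt algorithm: the function name and the pattern are assumptions made for this example, and a real extractor would also need to validate each candidate against the receptor sequence.

```python
import re

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Wild-type residue, sequence position, mutant residue, e.g. "R274A".
MUTATION_PATTERN = re.compile(
    rf"\b([{AMINO_ACIDS}])([1-9][0-9]*)([{AMINO_ACIDS}])\b"
)


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples for each mention in text."""
    mutations = []
    for match in MUTATION_PATTERN.finditer(text):
        wild_type, position, mutant = match.groups()
        if wild_type != mutant:  # skip non-mutations such as "A12A"
            mutations.append((wild_type, int(position), mutant))
    return mutations
```

A bare residue mention such as H305 is not reported, because the pattern requires a mutant residue after the position.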

Computational data 

A large and diverse collection of computationally
generated data is present in the NucleaRDB. Multiple
sequence alignments (MSAs) form the heart of the system
and allow users to easily transfer information between
different proteins. MSAs are available for all families and
subfamilies, and can be viewed using JalView (10) or
downloaded directly in a number of formats. MSAs
were created as described previously (7).
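
The idea of transferring information between proteins through an MSA can be sketched as follows: a residue number in one protein is mapped, via the shared alignment column, to the equivalent residue number in another. This is an illustrative sketch only; the function name, the gap character and the example sequences are assumptions and do not describe the NucleaRDB implementation.

```python
def map_residue(aligned_a, aligned_b, position_a, gap="-"):
    """Map a 1-based residue number in sequence A to sequence B.

    Both arguments are rows of the same alignment, so they have equal
    length. Returns None when the residue of A is aligned to a gap in B.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    count_a = 0
    count_b = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a != gap:
            count_a += 1
        if residue_b != gap:
            count_b += 1
        if residue_a != gap and count_a == position_a:
            return count_b if residue_b != gap else None
    raise ValueError("position_a lies outside sequence A")
```

For the toy alignment rows "MKT-LVR" and "M--ALIR", residue 4 of the first sequence maps to residue 3 of the second, while residue 2 has no equivalent and yields None.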

Correlated mutation analyses (CMA) can be used to 
identify groups of residues that mutate in tandem. 
Residues that show correlated mutation behavior are 
likely to be functionally related, and networks of those
correlating residues indicate functional units (11). 
Correlation scores are available for all (sub-)families. 
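
As a rough illustration of scoring how strongly two alignment positions vary together, the sketch below computes the mutual information between two MSA columns. Mutual information is used here only as a simple stand-in; the scoring method behind the NucleaRDB correlation scores (11) may differ, and the function name and toy alignment are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(msa, col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    pairs = [(seq[col_i], seq[col_j]) for seq in msa]
    n = len(pairs)
    count_i = Counter(a for a, _ in pairs)
    count_j = Counter(b for _, b in pairs)
    count_ij = Counter(pairs)
    score = 0.0
    for (a, b), n_ab in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
        score += (n_ab / n) * log2(n_ab * n / (count_i[a] * count_j[b]))
    return score
```

In the toy alignment ["AKL", "AKL", "DEL", "DEV"], columns 0 and 1 always change together and score 1 bit, whereas columns 0 and 2 score lower. A fully conserved column scores 0 against any other column, since it carries no variation to correlate.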

The entropy and variability for a position in an MSA can
be an indicator of the evolutionary pressures exerted
at that position (12). Entropy and variability scores are
available in tabular form and via an interactive page that
combines plots, tables and structure models in an
integrated view.
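
A per-position score of this kind can be sketched as follows, using Shannon entropy and taking variability to be the number of distinct residue types observed in a column. These definitions are assumptions made for illustration; the exact definitions used in the NucleaRDB follow (12) and may differ, for instance in how gaps are treated.

```python
from collections import Counter
from math import log2


def column_scores(msa, column, gap="-"):
    """Return (entropy_in_bits, variability) for one alignment column.

    Variability is taken here as the number of distinct residue types;
    gap characters are ignored for both scores.
    """
    residues = [seq[column] for seq in msa if seq[column] != gap]
    if not residues:
        return 0.0, 0
    counts = Counter(residues)
    total = len(residues)
    # Shannon entropy: sum over residue types of p * log2(1 / p).
    entropy = sum((n / total) * log2(total / n) for n in counts.values())
    return entropy, len(counts)
```

A fully conserved column gives an entropy of 0 and a variability of 1, while a column with four equally frequent residue types gives an entropy of 2 bits and a variability of 4.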

In addition to the already large amount of structural in- 
formation that is present in the NucleaRDB, homology 
models based on multiple template structures have been 
built for all NRs. All structure models were built using 
YASARA (13) and are available for download or can be 
viewed directly using Jmol (14). 



INFORMATION RETRIEVAL 

All data in the NucleaRDB web interface are extensively 
connected, allowing for easy navigation between different 
data types. The main way of accessing the NucleaRDB's 
contents is via the hierarchical family tree. For each family,
users can access the individual receptors, multiple sequence
alignments (and all derived data and analyses such as
correlation scores and protein distance networks), mutations,
structures and models (Figure 1). All pages contain links
to all related data and information. Extensive search
facilities are available, allowing the search for proteins,
sequences, structures, families and mutations using
various search criteria and filters. A BLAST service is
available that allows users to run their own sequences
against the NucleaRDB.

All data types and search facilities are accessible from
the web pages as well as from the web service endpoints,
allowing users to write workflows or in-house software
that uses the NucleaRDB.

Annotating scientific literature 

Utopia Documents (15,16) is a new PDF reader that offers 
unique opportunities to place information and knowledge 
in the context of scientific literature. We have integrated 
the NucleaRDB with the Utopia Documents PDF reader 
in such a way as to present to scientists, in a non-intrusive 
way, all NR-relevant data and information discussed in an

Figure 1. Screenshot of the NucleaRDB family page. The family tree is shown on the left with the thyroid hormone family expanded. On the
right-hand side, the data for the selected family is shown. 



Nucleic Acids Research, 2012, Vol. 40, Database issue D379 



[Figure 2 image: the annotation panel for mutation R274A in VDR_HUMAN lists the literature describing the mutation, its reported effects, other mutations at the same position (R274L), general information about the residue (residue type Arg, location H5) and links to download YASARA scenes.]


Figure 2. An impression of the Utopia Documents PDF reader interface to the NucleaRDB data. On the left-hand side a part of a scientific paper 
(17) is shown that is annotated by the NucleaRDB. Annotations are available for all the highlighted words. On the right-hand side an example of 
such an annotation (the mutation R274A) is displayed. 



Table 1. Contents of the NucleaRDB

Data type            Number of entries
Proteins             3764
Families             123
Mutations            1543
Protein structures   613
Structure models     3764
Residues             2 012 651
Species              339



article at hand. Annotations are provided for proteins,
residues and mutations mentioned in the PDF. For each
of these concepts the annotations contain carefully selected
information, as well as pointers to relevant web pages
and related scientific literature. An example is shown in
Figure 2. The PDF reader presents the scientist, in a
non-intrusive way, with all relevant data and information
related to the topics discussed in the article. This alleviates the
troubles associated with navigating the many links between
existing data and information available from the many
articles in this field. The scientist neither struggles to get
access to information related to topics within an article,
nor is swamped by unnecessary information that still
needs disambiguation; only data and information relevant
to the topic of the article are made available.



IMPLEMENTATION 

The data in the NucleaRDB are stored in a PostgreSQL
(www.postgresql.org) relational database. The web service
interface is developed with the Apache CXF
(cxf.apache.org) web services framework. We offer both Simple
Object Access Protocol (SOAP) and Representational State
Transfer (REST) endpoints. The web interface is built using the
Apache Wicket (wicket.apache.org) web application
framework. The database is accessed via a Hibernate
(www.hibernate.org) object-relational mapping layer. The
server runs within Sun's Glassfish (www.glassfish.org)
application server.

CONCLUSION 

The NucleaRDB provides researchers with a single point
of access for nuclear receptor-related data. Not only does
the NucleaRDB hold a large amount of information, it
also provides a broad scope of tools and dissemination
facilities, relieving scientists of many of the tasks that
come with collecting, validating and integrating many
diverse data.